Error estimates for solid-state density-functional theory predictions: an 
overview by means of the ground-state elemental crystals 
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Predictions of observable properties by density-functional theory calculations (DFT) are used in- 
2 creasingly often by experimental condensed-matter physicists and materials engineers as data. These 

Q predictions are used to analyze recent measurements, or to plan future experiments in a rational 
way. Increasingly more experimental scientists in these fields therefore face the natural question: 
QiQ what is the expected error for such a first-principles prediction? Information and experience about 

( this question is implicitly available in the computational community, scattered over two decades 

of literature. The present review aims to summarize and quantify this implicit knowledge. This 
' ^ ^ ' eventually leads to a practical protocol that allows any scientist - experimental or theoretical - to 

^ determine justifiable error estimates for many basic property predictions, without having to perform 

additional DFT calculations. 

,— H A central role is played by a large and diverse test set of crystalline solids, containing all ground- 

^ state elemental crystals (except most lanthanides) . For several properties of each crystal, the differ- 

Sence between DFT results and experimental values is assessed. We discuss trends in these deviations 
. and review explanations suggested in the literature. 

A prerequisite for such an error analysis is that different implementations of the same first- 
—I principles formalism provide the same predictions. Therefore, the reproducibility of predictions 

across several mainstream methods and codes is discussed too. A quality factor A expresses the 
spread in predictions from two distinct DFT implementations by a single number. To compare the 
PAW method to the highly accurate APW-|-lo approach, a code assessment of VASP and GPAW 
(PAW) with respect to WIEN2k (APW-l-lo) yields A-values of 1.8 and 3.3meV/atom, respectively. 
2 In both cases the PAW potentials recommended by the respective codes have been used. These 

' ' differences are an order of magnitude smaller than the typical difference with experiment, and 

therefore predictions by APW+lo and PAW are for practical purposes identical. 
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I. INTRODUCTION 

Density-functional theory [H [5] (DFT) remains 
one of the most popular methods to treat both 
model systems and realistic materials in a quan- 
tum mechanical way [SHH] . In condensed-matter 
physics, DFT is not only used to understand 
the observed behavior of solids, but increasingly 
more to predict characteristics of compounds 
that have not yet been determined experimen- 
tally (for example in Refs. [5 HTT|) . In both cases 
the first-principles results provide a point of ref- 
erence, either to analyze data from measure- 
ments or to plan future experiments. It is there- 
fore essential to have a quantitative idea of the 
expected deviation between a DFT prediction 
of a certain property and the corresponding ex- 
perimental value. Error estimates are routinely 
provided in experimental physics, but in DFT 
applications this is much less common practice. 
When confronted with a disagreement between 
theory and experiment, one usually resorts to 
higher-order levels of theory instead [HHH]- 
DFT as such is an exact reformulation of quan- 
tum physics, and does not involve any approxi- 
mation. From a purely theoretical point of view, 
it should lead to exact predictions, with no need 
for an error estimate. In practice, however, 
one requires an educated guess for an essen- 
tial ingredient of DFT: the exchange-correlation 
functional (hereafter referred to as 'functional'). 
Apart from this main approximation, there are 
some other features that go beyond DFT in the 
way it is usually applied, such as the failure of 
the Born-Oppenheimer approximation |151 116j 
or high-Z radiative corrections from quantum 
electrodynamics [T71[TH]. They affect the results 
to some extent as well, but they are generally 
much less important than the particular choice 
of the exchange-correlation functional. For any 
of these choices, DFT predictions will not agree 
perfectly with experimental observations. De- 
viations of this kind will be referred to as the 
intrinsic error for a particular functional. 

Regardless of the difference with experiment, 
however, all predictions should be independent 
of how the DFT (Kohn-Sham) equations are 
solved numerically: each of the many available 
DFT implementations should give identical re- 
sults for the same functional. In reality, there 
will be some scatter in the predictions of differ- 
ent codes [ini Unj J as each of them introduces 
a distinct amount of numerical noise. This sec- 
ond source of fluctuations leads to a numerical 
error. It is therefore also important to assess 



to what extent predicted properties vary among 
DFT approaches. 

The goal of this work is to quantify the knowl- 
edge about these two kinds of DFT errors, in- 
trinsic and numerical ones, and — where rele- 
vant — to review the physics behind them. 
A way to obtain insight into computational er- 
rors is by means of benchmark studies, exam- 
ining the performance of different implementa- 
tions and functionals for a large set of materials 
and properties. For molecular benchmark sets 
the intrinsic errors have already been assessed in 
great detail, often with the aim of selecting the 
best functional for a particular property (for ex- 
ample in Refs. Uni [3TH1SI). Similar studies exist 
for sohd-state DFT as well [24H39]. but they are 
mostly limited to a small number of properties 
and/or compounds. In addition, their focus is 
often on understanding the differences between 
functionals, so they do not lead to quantita- 
tive and universally applicable error estimates 
with respect to experiment. A benchmark that 
is really comprehensive, should meet two cri- 
teria: the number of elements that is included 
in the test set should be sufficiently large, and 
the crystal structures in the set should be suf- 
ficiently diverse. This guarantees the transfer- 
ability of the benchmark conclusions with re- 
spect to both the intrinsic and the numerical 
errors. 

A natural choice to construct such an exten- 
sive test set emerges from the periodic table of 
elements. By taking the ground-state crystal 
structures of all elements [ID], the two crite- 
ria for a comprehensive solid-state benchmark 
set are simultaneously fulfilled. All elements 
are included, thus trivially fulfilling the first re- 
quirement. In addition, the corresponding crys- 
tal structures range from simple hexagonal and 
cubic configurations to low-symmetry geome- 
tries, like orthorhombic and monoclinic cells. 
Of course, extrapolating the obtained insights 
to more complex materials, such as multicom- 
ponent compounds, requires some care. It is 
not impossible, however, as will be shown for 



example in Sec. IIIB 



An additional advantage of using all elemental 
crystals follows from the periodic table's inher- 
ent ability to display trends and correlations. 
The systematic behavior of observable quanti- 
ties along periods or groups is well known, but 
the deviations between DFT predictions and ex- 
perimental values also appear to follow such 



trends (Sec. III). In particular, the largest er- 



rors are restricted to distinct regions. So apart 
from providing a complete test set, the classifi- 
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cation of elements in the periodic table allows 
for an easy visualization and interpretation of 
the data. Furthermore, the elemental materi- 
als are among the best known and most studied 
materials on Earth. Experimental data collec- 
tions are hence rather easy to find, and one can 
assume with sufficient confidence that the re- 
ported data are accurate. 

By means of the ground-state elemental crys- 
tals, the present review offers an overview of the 
power and limitations of solid-state DFT calcu- 
lations. Although this knowledge is often im- 
plicitly available in the computational commu- 
nity, most of it is scattered over two decades of 
literature. The current work therefore summa- 
rizes and quantifies this implicit knowledge into 
a practical protocol, that will allow any scientist 
- experimental or theoretical - to provide jus- 
tifiable error estimates for many basic property 
predictions. Intrinsic errors follow from a sta- 
tistical analysis of the deviations between DFT 
predictions and experimental values for a given 
functional. Subsets of materials for which the 
deviations are particularly large are identified, 
and the reasons for this behavior are discussed. 
Numerical errors, on the other hand, express to 
what extent two independent DFT approaches 
produce identical predictions. We will focus 
on the correspondence between two particular 
methods, APW-l-lo and PAW, by means of rep- 
resentative mainstream codes. For the PAW 
method, the use of different atomic potentials 
can have large effects, but by only consider- 
ing the sets of potentials recommended by each 
code, we hope to establish a general idea of the 
PAW error. 

Instead of performing an extensive level-of- 
theory study for various functionals, we will fo- 
cus on one typical example within the gener- 
alized gradient approximation of DFT (GGA). 
For this, the PBE functional is chosen, because 
it is known to yield good results for solids of 
a wide range of elements and properties 
Moreover, its popularity [41] guarantees the 
comparibility and applicability of our results. 
Other GGA functionals are expected to dis- 
play approximately the same behavior, except 
maybe for very specific material classes. Kurth 
et al. '24' for instance have shown how four 
GGA functionals provide similar trends for both 
the equilibrium volume and the bulk modulus. 
Of course, the determination of the error esti- 
mates is not limited to GGAs. The presented 
methodology is also applicable to other func- 
tionals (such as LDA or hybrid functionals) 
or first-principles approches (such as Hartree- 



Fock, GW or RPA 

This review is organized as follows. Sec. [ll 
describes the computational procedure for all 
properties under consideration and discusses 
the prerequisites for a sound comparison be- 
tween theory and experiment. Within Sec. |III 
the differences between DFT-GGA predictions 
and experimental values are assessed (intrinsic 
errors), whereas Sec. IV focuses on the method 
and code dependence of the theoretically deter- 
mined properties (numerical errors). 



II. PREDICTING EXPERIMENTAL 
PROPERTIES BY MEANS OF DFT 

A. Computational recipes 

DFT computations for five distinct sets of ma- 
terials properties will be discussed. They can 
be divided into energetic (AEcoh) and elastic 



quantities (Vq, Bq, Bi, Cij) 



Of course 



many more properties may be determined by 
means of DFT, but the quantities introduced 
here are directly available from straightforward 
total energy calculations. 

The cohesive energy or atoniization energy 
AEcoh is a popular benchmark quantity [ 19.,, .211 
imiSllMllSnilSS]- Expressed as an energy dif- 
ference per atom, it is defined as 



AEcoh — — {Eq — Eat) 



(1) 



Here Eg represents the energy per atom of the 
compound under investigation in its ground 
state, i.e. at OK and without external stress. 
One can determine it by means of a standard 
pressure optimization procedure, or by fitting a 
few E{V) data points to an empirical equation 
of state (EOS) and extracting the equilibrium 
energy analytically. In this work the latter op- 
tion was chosen, using a common third-order 
Birch-Murnaghan relation [33] : 



E{V) ^Eo + 



9VoBq 
16 



2/3 



Bi 



2/3 



6-4 



2/3' 



(2) 



Vo represents the equilibrium volume, Bq the 
bulk modulus and Bi its pressure derivative. 
Other equations of state exist as well, but no 
significant difference with respect to ground- 
state properties is to be expected. 
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Eat on the other hand is the energy of one iso- 
lated atom in its electronic ground state. Since 
many solid-state DFT codes only allow for the 
use of periodic boundary conditions, the iso- 
lated atom needs to be calculated in a periodic 
unit cell as well. In the present computations a 
free atom is placed in a big orthorhombic unit 
cell, such that every atom is surrounded by at 
least 15 A of vacuum. In this way one can suf- 
ficiently suppress spurious interactions between 
periodic images (< ImeV). The orthorhombic 
symmetry is chosen over e.g. a simpler, cu- 
bic one, to avoid physically incorrect spherical 
states. After all, the use of a unit cell forces the 
electron density to assume the same symmetry 
as the lattice [IS]. This is in most cases only 
possible by means of partial occupation of the 
different electron orbitals, which is not physical. 
Lowering the crystal symmetry counteracts this 
phenomenon and should lead to strictly integer 
occupation numbers. However, some atoms end 
up with partially filled states, even when this 
approach is applied. In such cases, the occupa- 
tion numbers have to be fixed manually before 
looking for the usual, self-consistent solution. In 
this work only experimental ground-state elec- 
tron configurations are used: even when DFT 
predicts a different configuration to be more sta- 
ble [H] (e.g. for W), the experimental occupa- 
tion numbers are taken in order to guarantee a 
meaningful comparison to measurements. Only 
for spin-orbit coupled calculations for the Pb 
atom it is not possible to impose the experi- 
mental electronic state. The PBE ground state 
^Sq is therefore used, and not the experimental 
^Pq state. 

The negative sign in Eq. ([I]) causes positive co- 
hesive energies to correspond to stable phases 
(with respect to atomic decohesion). The other 
sign convention, however, is commonly used as 
well. 

One of the key physical properties of a given 
compound is its volume. In first-principles cal- 
culations the equilibrium volume per atom Vq 
can be obtained easily. One either employs an 
optimization routine or fits some E(V) points 
to an empirical equation of state. This is simi- 
lar to the procedure used to determine Eq, and 
again the latter option is chosen in this work. 
The bulk modulus is closely related to the 
E{V) behavior as well. It is proportional to 
the curvature of the equation of state at the 
equilibrium volume: 



V 



dP 
dV 



V 



d^E 



V=Vo 



It represents the resistance of the unloaded ma- 
terial to volume change, and hence to uniform 
pressure. Because it is linked to the curvature of 
the E{V) relation, Bq is a numerically sensitive 
quantity. A small deviation at a few data points 
is already able to change its value noticeably, es- 
pecially when the bulk modulus is small (shal- 
low EOS). This is increasingly so when only a 
narrow volume range is inspected. 
Bi stands for the derivative of the bulk 
modulus with respect to pressure, evaluated 
at the equilibrium volume: 



Bi 



dB 
dP 



V=Va 



d_ 
dP 



V 



.d^E 



(4) 



V=Vo 



(3) 



V=Vo 



It is a third-order derivative of the energy and 
hence describes effects that are one order higher 
even than the bulk modulus. It is related to 
the volume-dependence of the E{V) curvature. 
Bi is therefore the most sensitive elastic quan- 
tity discussed in this study. Again, both the 
bulk modulus and its pressure derivative are ob- 
tained from fitting an EOS to calculated E{V) 
data points. 

The mechanical behavior of a crystal cannot 
be described solely by means of the bulk mod- 
ulus. When anisotropic deformations are ap- 
plied, other elastic constants come into play 
as well. The full set of these constants makes up 
the stiffness matrix C. It represents a tensor of 
rank 2 and relates (small) cell strains to the cor- 
responding stresses via Hooke's law cr = C ■ e. 
C is a symmetric 6x6 matrix, containing 21 
independent constants at the most. In the case 
of hexagonal crystals five distinct values remain 
(Cii, Ci2, C33, Ci3, and C44), while for cubic 
compounds there are only three (Cn, C12, and 
C44). The Cij parameters can also be trans- 
lated into more general elastic moduli, such as 
Young's modulus E, the shear modulus G and 
Poisson's ratio v. Even the bulk modulus can be 
obtained from a simple combination of the Cij. 
In addition, the elastic constants are known to 
relate to structural stability and various other 
important physical properties [47] I48j. 
Several methods are available to obtain the elas- 
tic constants from first principles, either by re- 
lating energy and strain [49^ or stress and strain 
[501 I5T] . In most cases a stress-based proce- 
dure is preferred, because it is inherently faster. 
However, it requires an ab initio code that can 
determine the stress tensor. In a first step the 
cell pressure components are then extracted for 
a minimal set of deformed geometries. Together 
with the corresponding strains, this results in a 
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system of linear equations. Solving that system 
yields the required elastic constants. When it 
is important to obtain an accurate value of Cij , 
one should construct an overdetermined system, 
by applying the same strain sets at different 
magnitudes. The elastic constants can then be 
retrieved by using a least-squares method. 



B. Comparing theory and experiment 

When a DFT prediction is compared to a num- 
ber from experiment, the corresponding ambi- 
ent conditions should be as identical as possi- 
ble. This means in the first place that the ex- 
perimental result should refer to OK. Moreover, 
the measurement should be corrected for zero- 
point vibrational effects, which are not present 
in standard DFT calculations. The following 
paragraphs discuss how to extrapolate the ex- 
perimental values to absolute zero and correct 
them for zero-point vibrations. 
For the cohesive energy it takes little effort to 
match up theory and experiment consistently. 
Experimental data at low temperatures are in 
most cases available. Only the zero-point en- 
ergy C hinders a direct comparison between K 
and experiment. From quantum mechanics this 
quantity is known to be with (w) the av- 

erage phonon frequency. The latter can be es- 
timated from Debye theory, where it is propor- 
tional to the maximum vibrational frequency, 
and hence to the Debye temperature 6_d. The 
zero-point energy correction becomes |52j 



(5) 



Theoretical cohesive energies can only be com- 
pared to experiment if this contribution is 
added to the experimental values (added, due 
to the chosen sign convention in Eq. ([T)). 
When no experimental value is available, Qd 
can be estimated. Here the Debye- Griineisen 
approximation [SJ 



Od = 0.617— (67r2) 

Kb 



2^ 1/3 ^1/6 



Bo 
M 



1/2 



(6) 



will be used. Both Vq and the mass M are ex- 
pressed per particle, corresponding to a single 
atom for most materials. For dimeric crystals, 
however, the diatomic molecule is chosen as a 
unit of repetition. The regular, room tempera- 
ture experimental values for Bq and Vq are filled 
in, except when the difference with low temper- 
ature results (see further) is significant. This is 
the case for CI, Br, and I. 



Thermal volume corrections consist of two 
parts. Assuming to have a room temperature 
measurement at one's disposal, the first step 
consists in accounting for thermal expansion 
from absolute zero to ambient temperature: 



V 



r 

Jo 



av{T) dT 



(7) 



av{T) represents the temperature-dependent 
volume expansion coefficient. It is zero at OK 
and ay^rt at room temperature (Trt)- Since 
AV^^'' constitutes only a small correction with 
respect to the total volume V, Eq. ([t]) will be 
approximated here as 



V 



Oiv.rt — dT = (8) 







T, 



rt 



In a limited number of cases the experimental 
expansion coefficient is not known. It can then 
be estimated from an empirical correlation to 
the 'moleculization' energy [54]: 



av.rt = 3 X 



.14 • 10~^eV/K/atom 

AEmol 



(9) 



AEmoi is defined as the energy difference per 
atom between the crystalline material and its 
gaslike molecules. For elements with an atomic 
gas phase, it reduces to the atomization energy 
(cohesive energy) . In the absence of experimen- 
tal data on both ay^rt and AEmoh Eq. ([9| is 
completed with DFT values. 
A second modification is again due to zero-point 
effects. Because of the volume-dependence of 
the zero-point energy C, the equilibrium volume 
is shifted slightly. According to Alchagirov et al. 
[Ml [52] , this small difference per atom amounts 
to 



^^(2) ^ (^1 - DC 

2Bo 



9 

16 



Bi 



1)^ (10) 



Dacorogna and Cohen |55| propose an alterna- 
tive definition of the zero-point volume shift. 
They obtain a similar formula, but with Bi 
instead of Si — 1 in Eq. (10). However, the 



mathematical expression is preceded by some 
significant simplifications. When calculating 
zero-point effects it is therefore advisable to use 
Eq. ( [To| instead, especially when Bi is small. 
For the bulk modulus thermal effects should 
be taken into account as well. A first contribu- 
tion originates in the thermal expansion of the 
material. Similar to AV^^\ a correction AB*^^) 
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can be determined too. Roughly approximating 
the relevant behavior, one can write 1561 



(11) 

On the other hand, the effect of zero-point vi- 
brations on the bulk modulus boils down to 1521 



AB^ 
Bo 



Vo 



(Bi 



B,-l 



1 



1) 
1 



-Bi — -BoB-^ 



(12) 



B2 stands for the second-order derivative of the 
bulk modulus with respect to pressure. It is a 
highly sensitive parameter and very difficult to 
extract from a few E{V) data points. In addi- 
tion, B2 is not included in Eq. ^ and a higher- 
order Birch-Murnaghan fit should be applied. 
Instead, the present work will use the intrinsic 
Birch-Murnaghan value: 



{B0B2) 



BM 



Bo 



92 



Qp2 

143 



V 



BM 



V=Vq 



7Bi - Bi 



(13) 



There are other possibilities as well [52], rang- 
ing from a different equation of state to an accu- 
rate numerical determination of B2 ■ In order to 
establish the small correction AB^'^\ however, 
this more consistent approach suffices. 
Since it is already hard to accurately measure a 
high-order parameter like Bi or to determine it 
from first principles, zero temperature modifica- 
tions will often be negligible compared to exper- 
imental or computational errors. Bi is therefore 
not adjusted to incorporate thermal expansion 
or zero-point effects. 

No thermal corrections are applied to the elastic 
constants Cij as well. One can however imagine 
a modification similar to that of Dacorogna and 
Cohen [551 for the bulk modulus: 



AC) 



P ( Ay(™) 



dP 



dP Vo 



(14) 



with m = 1 to account for thermal expansion 
and m = 2 for zero-point effects. Unfortunately 
experimental data about the pressure derivative 
of the elastic constants are scarce. 



III. INTRINSIC ERRORS 

A. Test set preparation 

In order to establish statistically justified intrin- 
sic error estimates, the ground-state elemental 
crystals at K will be used as a benchmark set. 
Pettifor [57] lists these crystal structures, based 
on an overview by Villars and Daams [40] . How- 
ever, in some cases literature suggests another 
phase to be even more stable at low temper- 
atures. In order to ensure the use of OK cell 
geometries as much as possible, such an alter- 
nate structure is taken for boron [583, nitrogen 
[55] . oxygen [5D], and sulfur ^T\. Tab.|l]presents 
an overview of all structures used in the current 
test set. 

Using a OK benchmark set entails two distinct 
advantages. On the one hand some elements 
only crystallize just above absolute zero. Col- 
lecting both 300 K and OK compounds in one 
set might then seem a bit inconsistent. On the 
other hand this approach facilitates the extrap- 
olation from the experimental temperature to 
OK, as there are no phase transformations along 
the way. 

All structures are considered in their stress-free 
ground state. This means that, when the space 
group allows some freedom in the internal posi- 
tions, an optimization with respect to the total 
energy is necessary. This optimization proce- 
dure calls for a fast and well-accepted DFT algo- 
rithm. The projector augmented wave method 
[51[S3] (PAW) as implemented in VASP (SUM] 
(version 5.2.2) fulfills both criteria. The ele- 
mental crystal structures have therefore been 
relaxed by means of this code, using the rec- 
ommended PAW atomic potentials listed in the 
manual |65| . A force convergence criterion 
of 0.01 eV/A was set. All calculations have 
been performed using the tetrahedron method 
with Blochl corrections [6B , while the reciprocal 
space was sampled by means of a Monkhorst- 
Pack grid [ST]- Further computational details 
for the calculations are given in the Supplemen- 
tary Material [BF. 

The equilibrium structure has been obtained 
in two stages. For the determination of the 
equilibrium volume a uniformly spaced 13-point 
EOS (up to Vb ± 6%) has been calculated 
and fitted to a least-squares third order Birch- 
Murnaghan relation (see Sec. II A). Only for a 
number of shallow E{V) curves — in particular 
for H, N, S, the halogens, and the noble gases — 
an increased volume range turned out to be nec- 
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Table I. Ground-state crystal structures for all elements up to radon. Both the space group number and the 
Pearson notation are given (with hRx standing for x atoms in the hexagonal setting of the rhombohedral 
unit cell) 
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essary. For each of the 13 crystal volumes, the 
atomic positions and the cell shape have been 
individually optimized. In a second step, the 
crystal has been reinitialized at the fitted Vq 
and has then been optimized again. 

These optimized crystal structures form the 
definitive test set (submitted to the COD [69] 
and ICSD [TU] crystallographic databases). For 
each of them, most of the properties discussed 
in Sec. Ill Al have been determined in order to 
quantify the difference between PBE and ex- 
perimental values (Sec. |IIIB3| ). The DFT part 
of the comparison has been performed by means 
of VASP, using the settings mentioned earlier. 
They allow to converge all energy differences up 
to a few meV per atom at the most. For O and 
Cr (antiferromagnetic), Mn (ferrimagnetic) , Fe, 
Co, and Ni (ferromagnetic), spin polarization 
has been taken into account, while for the heav- 
iest elements (as from Lu) spin-orbit contribu- 
tions have been incorporated. At that point 
relativistic effects beyond the scalar-relativistic 
approach become important, as will be shown 



later (Tab. VII I 



The analysis in Sec. Ill B 3 will not show the raw 
calculated data, but will rather elaborate on the 
deviation between theory and experiment. The 
first-principles results and the thermally cor- 
rected experimental numbers [371140114311^11711 - 
I103j have been included in the Supplementary 
Material [55]. A tabulation of calculated and 
experimental values for the elastic constants 
Cij was published before by Shang et al. [37] 
for most of the present benchmark set. In 



Sec. |III B 3| their data are used. Only for the 
experimental numbers of Ba we found more re- 
alistic results elsewhere [96l [97] . Since the au- 
thors only considered bcc, fee and hep struc- 
tures, this implies that for Li and Na a different 
geometry was applied than in the rest of this 
work (bcc instead of hR9). Moreover their re- 
sults are based on a PW91 functional [1041 1105] . 
rather than the PBE approximation employed 
in the rest of this work. Although these GGA 
approaches yield different results in a few situ- 
ations [106', they are in most cases very similar 
and for the elastic constants no significant de- 
viation is expected. 



B. Statistical analysis 

1. Linear regression 

Benchmark studies usually analyze the differ- 
ence between DFT and experiment statistically. 
The most common characteristics investigated 
are the mean error (signed) and the mean abso- 
lute error (unsigned). However, this approach 
implicitly assumes that the offset between DFT 
predictions and experimental results is the same 
for large and small values. For strictly positive 
quantities a relative shift seems more reason- 
able. The present analysis therefore explicitly 
treats relative deviations, in addition to the re- 
maining scatter on that trend. This is done by 
means of a linear regression between DFT data 
and experimental results. 
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Calculated property 



Figure 1. (Color online) The intrinsic error between 
experiment and DFT can be decomposed in a sys- 
tematic deviation and a residual error bar. Sys- 
tematic differences are quantified by comparing the 
linear regression line X — PT (red, dashed) to the 
bisector X — T (black, full). The residual error bar 
(green arrows) is defined as the standard deviation 
of the regression errors (SER) 



The linear regression is performed by means of 
a least-squares fit, from which we obtain the 
slope as weU as the scatter with respect to the 
regression line [107] . The model hence presumes 
that a perfect correlation between experiment 
X and theory T exists, distorted by a random 
error e centered around a zero mean: X = /3T + 
e. If the exact exchange-correlation functional 
were known, it would approximately lead to e = 
and (3 — 1. In practice, comparing the least- 
squares estimate of /3 to 1 offers a good measure 
of any systematic deviations, while the standard 
deviation of e (also denoted as standard error 
of the regression or SER) expresses the residual 
error bar (see Fig. [T]) . 

Although the experimental community com- 
monly employs the nomenclature 'systematic 
errors' and 'non-systematic errors', we will 
avoid using the second term. Non-systematic 
errors suggest a certain degree of randomness, 
which is not present in DFT. When one per- 
forms the same experiment several times, the re- 
sults are spread around a mean value. In DFT, 
repeating the same calculation always yields 
identical results. In contrast to experimen- 
tal error analysis, the spread of deviations in 
DFT results becomes apparent only when pre- 
dictions for many different compounds are com- 
pared to experiment, i.e. when a benchmark 
set is used. The DFT scatter is then caused 
by some (subsets of) crystals being described 
better by the functional than others. Intrin- 



sic error bars are hence fundamentally different 
from experimental error bars as well. The sys- 
tematicness of DFT also appears when study- 
ing results within a certain family of materials: 
similar systems behave almost identically, prov- 
ing the DFT spread not to behave randomly at 
all. An excellent example is found in literature, 
where the correspondence with experiment dis- 
plays much less scatter when chemically similar 
compounds are involved 

As an additional side note, one should be aware 
that experimental error bars have not been 
taken into account in any way. In fact, the 
statistical model should he X + r] = /3T + e, 
where 77 represents an additional (but uncorre- 
lated) zero-mean perturbation. This does not 
only affect the comparison between individual 
DFT and experimental results, but also influ- 
ences the values of the intrinsic systematic devi- 
ations and residual error bars that are presented 
in this work. By considering a test set that is 
sufficiently large, such as in the present study, 
one may hope that these effects level out. In ad- 
dition, the elemental crystals belong to the most 
intensively studied materials, such that the ex- 
perimental errors are much smaller than for reg- 
ular compounds. Without full knowledge of the 
experimental errors, however, we can only state 
that the SER provides an upper limit for the 
real PBE spread a^. A possible solution consists 
in comparing DFT values to results from highly 
accurate many-body techniques instead. Such 
high-precision data are not available for many 
of the materials considered here, however, and 
to calculate them ourselves exceeds the scope of 
the current review. 



2. Eliminating outliers 

A full statistical analysis of all elemental data 
cannot be performed straightforwardly. Some 
subsets of elements strongly distort the agree- 
ment between DFT and experiment. More 
meaningful error estimates are obtained when 
the most striking outliers are removed from the 
data set. Since the deviating behavior is often 
caused by a bad description of some underly- 
ing physical mechanism, most of them will be 
grouped in subsets of similar compounds. In- 
stead of removing one outlier at a time from the 
data set, we choose only to remove entire struc- 
ture types at once. This way, individual mate- 
rials that behave well for the wrong reasons, are 
excluded as well, avoiding bias towards smaller 
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Figure 2. (Color online) Decomposition of the pe- 
riodic table into smaller subsets of elements, based 
on common physical properties of the corresponding 
ground-state crystals: (1) alkali and alkaline earth 
metals, (2) nonmagnetic transition metals, (3) mag- 
netic materials, (4) correlation-dominated materi- 
als, (5) high-coordination p block compounds, (6) 
low-coordination p block compounds, (7) molecular 
crystals, and (8) noble gases. Subsets 7 and 8 cor- 
respond to materials where dispersion interactions 
are essential 



errors. 

A decomposition of the test set into eight sub- 
sets is proposed, based on some common phys- 
ical properties of the corresponding elemental 
crystals (Fig. [2]). They are: (1) alkali and al- 
kaline earth metals, (2) nonmagnetic transition 
metals, (3) magnetic materials, (4) correlation- 
dominated materials, (5) high-coordination p 
block compounds, (6) low-coordination p block 
compounds, (7) molecular crystals, and (8) no- 
ble gases. Obviously for some boundary ele- 
ments, the most appropriate subset can be mat- 
ter for discussion, but the classification in Fig. [2] 
explains most trends for the intrinsic errors in 
a satisfactory manner (see further). 
These eight subsets of elemental materials are 
representative for more complex (multicompo- 
nent) crystals as well. They provide prototype 
systems for particular bond types and physical 
phenomena, such as London dispersion (subsets 
7 and 8), magnetism (subset 3) and electronic 
correlation (subset 4). Observations of DFT 
performance for these eight subsets will there- 
fore carry over to multicomponent compounds. 
In order to eliminate deviating subsets in an 
objective way, the following procedure has been 
used. All subsets from Fig. [2] that have half 
or more of their elements differing significantly 
from the dominating trend, have been excluded. 
A two-sided p- value of 10% is maintained. In 
other words, a data point is considered to de- 
viate substantially when the (signed) relative 



residual error (expressing the difference between 
the DFT regression value and the experimen- 
tal one) belongs to the outer 10 % of a nor- 
mal distribution. This approach has been re- 
peated in an iterative way: after the elimina- 
tion of each subset the significance criterion has 
been reestablished, until no deviating subsets 
remained. For solids belonging to an excluded 
category, PBE is not expected to provide reli- 
able property predictions. 

This selection criterion has been visualized in 
Tabs. [IT] to |III[ For each elemental crystal 
the relative residual error \xexp — Xreg\/ Xexp is 
shown. Large numbers suggest a significant de- 
viation from the regression line and hence allow 
to recognize outliers. Because these differences 
are displayed in the shape of the periodic ta- 
ble, they allow for easy identification of deviat- 
ing subset^ A color code has been added to 
improve intuition, with the darkest shades cor- 
responding to the largest deviations. The devi- 
ations with respect to the elastic constants rep- 
resent the mean absolute errors over Cn, C12, 
C33, C'ls, and C44. 

Another way to visualize the assessment pro- 
cedure is presented in Fig. [Sj In order to get 
a better view on systematic deviations, both 
the linear regression line (dashed red) and the 
first quadrant bisector (full black, represent- 
ing a perfect match between theory and exper- 
iment) have been added for all accepted ele- 
ments (see Tab. IV). As an additional quality 
indicator, the Pearson product-moment corre- 
lation coefficient r has been included for each 
property [lllj . with r = 1 indicating a perfect 
positive correlation. Data points correspond- 
ing to omitted subsets, on the other hand, have 
been represented by an open symbol. 



3. Predicting experiment 

Using all crystals that survived this selection 
procedure (filled symbols in Fig. [s]), a least- 
squares linear regression can now be computed 



^ The graphical representations in this work employ the 
conventional, medium-form periodic table. Hydrogen 
is kept in group lA (contrary to the vivid discussion 
in e.g. Ref. 108) and lutetium in group IIIB 109 . 
This increases the intuitive character of the results. 
Insights should be conveyed at a glance and many 
researchers are most familiar with the standard format 
of the periodic table. For the same reason the two- 
dimensional representation is preferred over some ID 
alternatives |110j . 
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Table II. (Color online) Relative residual errors (of the VASP-PBE regression results with respect to the 
thermally corrected experimental values) for AEcoh 43", '71] [72] (green), Vo [40] [73 -80 (red) and Bo [43] 
[551 [5TH55| (blue) of the elemental crystals. The darkest shades correspond to the largest errors 
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Table III. (Color online) Relative residual errors (of the VASP-PBE / -PW91 regression results with respect 
to the thermally corrected experimental values) for Bi [89H95] (purple) and dj [33|96l|97] (PW91, cyan) 
of the elemental crystals. The darkest shades correspond to the largest errors 
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for all properties from Sec. |II A| Tab. |IV] sum- 
marizes the resulting intrinsic errors in terms 
of relative systematic deviations {1 — /3) and 
residual error bars (SER) . Systematic errors are 
mentioned for PBE with respect to experiment, 
so a positive number implies PBE to overes- 
timate that property. Each percentage is ex- 
pressed relative to the PBE result, allowing for 
a straightforward calculation of the regression 
value (xreg — Xth ~ (1 ~ P)xth)- Between brack- 
ets the significance level of /3 ^ 1 is mentioned. 
It represents the two-sided p-value when a null 
hypothesis of /3 = 1 is assumed: if there really 
were no deviation at all, a small p-value would 
indicate that finding an even more extreme re- 
sult would be highly unlikely. For the residual 
error bars a 95 % confidence interval is given. 



The last column lists the subsets of elements 
that were excluded from the regression analy- 
sis by the selection procedure described before. 
For this the naming convention from Fig. [2] has 
been used. 

This statistical treatment makes several im- 
plicit assumptions. One of the most important 
premises is the use of a relative error over a 
constant offset. After all, for strictly positive 
quantities, such as Vq or T^j an invariable shift 
seems counterintuitive. The impact of such an 
error is much larger when the investigated prop- 
erty is small. In addition, when using rela- 
tive systematic deviations, the difference from 
/3 = 1 indeed matters. For the equilibrium 
volume (p = 6 • 10~^^) and the bulk modulus 
(p = 5 • 10^**) the deviation from the bisector 
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Figure 3. (Color online) Linear regression (dashed red) between the (thermally corrected) experimental 
and theoretical results (VASP-PBE / -PW91, see text) for the cohesive energy |43II72| . equilibrium volume 
[401 [73^80] . bulk modulus [43l ESI ISTtiSH] . pressure derivative of the bulk modulus [ISHSS], and elastic 
constants [371 1961 The full, black line stands for x^xp ~ xth- r represents the Pearson product-moment 
correlation coefficient for all elements included in the regression (filled symbols). The criterion for excluding 
certain elements from the fit (open symbols) is discussed in the text 
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Table IV. Systematic deviations 1 — /9 and intrinsic 
error bars (SER) for the VASP- PBE/-P W91 (see 
text) properties presented in Tabs. |II|III compared 
with experiment. The significance of the system- 
atic deviation from Xexp ~ Xth is indicated between 
brackets, by means of the two-sided p- value (low p- 
values for high significance). For the standard error 
of the regression a 95 % confidence interval is given 
in terms of the upper and lower limit (superscript 
and subscript). Subsets containing a lot of outliers 
have been excluded from the data set by means of 
the procedure mentioned in the text (notation from 
Fig.[2f 
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Xexp = Xth is clearly significant. For other prop- 
erties this is not always that obvious, but due 
to physical connections with Vq and Bq, it is 
relevant to consider systematic deviations there 
as well. 

A second remark concerns the nature of the 
intrinsic residual error bars. The numbers in 



Tab. IV were computed assuming a normal dis- 
tribution for the random error e. This results in 
an absolute residual error bar. Indeed, a Pear- 
son's x^-test does not contradict the apphca- 
bility of a normal distribution to the intrinsic 
random errors. A null hypothesis assuming a 
Gaussian distribution around zero yields a 1- 
sided p- value of 0.21 for the volume. Within 
the DFT community, however, relative error 
bars are often implicity assumed. As a conse- 
quence, small volumes are expected to be pre- 
dicted much more accurately, for example. Ac- 
cording to our analysis, the matter appears to 
be not that simple, as is shown in Fig. |4] In 
Fig. |4(a)[ the relative residual errors are plot- 
ted against the volume obtained by the least- 
squares fit. The overall decreasing trend sug- 
gests that the DFT errors are bes t desc ribed in 
terms of absolute error bars. Fig. |4(b)] displays 
the absolute residual errors instead. A rough 
linear correlation emerges, that implies that on 
the contrary a relative error bar is more appro- 
priate. Both conclusions are compatible, by as- 
suming a relative residual error bar of about 3 % 
for small to median volumes (< 50 A'^ /atom) 
and an absolute residual error bar of approx- 



imately 1.5 A"^ /atom for larger volumes (with 
the exception of a few outlierf]^. The number 
of data points is not sufficiently large for this 
finding to be really convincing, however. We 
therefore prefer to adopt the over all absolute er- 
ror bar of 1.1 A'^/atom (Tab. IV) as being valid 



for all volumes. Admittedly, for most small vol- 
umes the intrinsic residual error bars are then 
overestimated. 

For any given material, the information in 
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Figure 4. Relative (a) and absolute (b) differences 
of the DFT regression data (DFT*) with respect to 
thermally corrected experimental values [401 l73H80j 
for the equilibrium volume 



5^ 



^ The fitting error is primarily correlated to the bond 
type. Although badly performing subsets of mate- 
rials have already been excluded from the test set, 
some crystal types are described significantly better 
by DFT than others, especially those with strong 
bonds (covalent p-bonds, half-filled d-shell elements). 
Their strong bonds also lead to more compact struc- 
tures, which explains the smaller errors for smaller 
volumes (Fig. |4(b)| . Other subsets of crystals with 
similar volumes perform worse, however, and give rise 
to the larger relative errors in Fig. |4(a)[ 
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Table V. Demonstration of the use of intrinsic errors (Tab. IV I to improve DFT-PBE predictions and assess 
their rehabihty 





Vo(W) [AVatom] 


Bo (diamond) [GPa] 


V(,(GaAs) [AVatom] 


PBE (bare) 


16.28 


434.8 


23.73 EH 


systematic deviation 


15.69 (-3.6%) 


456.0 (+4.9%) 


22.87 (-3.6%) 


zero-point correction 


15.71 (+0.02) 


438.4 (-17.6) 


23.00 (+0.13) [Ml 


residual error bar 


15.7 ± 1.1 


438 ± 15 


23.0 ± 1.1 


experiment (OK) 


15.8 (34) 


443 [43, 


22.5 t34j 



Tab. IIVI can now be used to determine a mean- 
ingful estimate of a certain property. As an 
example we consider the bulk modulus of dia- 
mond. This material is not included in the test 
set (the ground-state crystal structure of car- 
bon is graphite), but it can be assigned to sub- 
set 5, the high-coordination p-block compounds 
(similar to Si, Ge, and Sn). Subset 5 does not 
belong to one of the excluded subsets for Bq, 
and the intrinsic error for the bulk modulus 
prediction of diamond should therefore be rep- 
resentative. A bare VASP-PBE computation 
yields 434.8 GPa. When taking into account 
that PBE bulk moduli are systematically too 
small by 4.9 % (Tab. IV I, this value increases to 
456.0GPa. By means of Eqs. ([l2]), ([lO|, and ^ 
zero-point corrections (—17.6 GPa) are added 
back in, yielding 438.4 GPa. Using the ap- 
propriate intrinsic residual error bar of 15 GPa 



(Tab.|IV|), the final result becomes 438±15 GPa. 
This is the most accurate DFT-PBE prediction 
of the experimental bulk modulus at OK, in- 
cluding an error bar on the computed value. 
In comparison, the experimental value extrap- 
olated to absolute zero (Eqs. (11) and (|8|) 
amounts to 443 GPa [43] . This number remains 
neatly within the error bar and is indeed closer 
to the regression-corrected bulk modulus than 
to the bare DFT value. A similar procedure can 
be used for all properties in Tab. |IV[ Tab. |V] of- 
fers a few more examples. 

Tab. |IV| is based on elemental solids only. One 
has to verify that the results from this statisti- 
cal analysis are transferable to multicomponent 
materials. A good test case in that respect is the 
collection of thirty-one binary compounds for 
which Haas et al. [M! 'SS] calculated lattice pa- 
rameters by means of PBE. When we take their 
DFT results, the experimental volume falls out- 
side the confidence interval for seven crystals. 
In all of these cases, the PBE volume is too 
large. By taking into account the systematic 
overestimation of 3.6%, however, only in two 
cases the experimental value exceeds the pre- 



dicted range. Both of these compounds are ion- 
ically bound (NaF and NaCl), a bond type that 
has not been considered in any of the proposed 
subsets (Fig. [2| . Although the multicomponent 
test set may therefore not contain all materials 
types and although it only relates to the atomic 
volumes, these results already strongly support 
the transferability of our error estimates. 
For one example from Haas et al., GaAs, the in- 
trinsic error contributions are listed in Tab. 
It illustrates that the systematic deviation re- 
ally matters if it is the goal to get as close as 
possible to the experimental value. This also 
appears from the mean absolute difference be- 
tween experiment and the 31 theoretical pre- 
dictions by Haas et al.. When one does not 
apply the relative deviation from Tab. |IV[ this 
number amounts to 0.72 cry (0.78 A'^/atom), 
while for the regression values it is only 0.37 cry 
(G.4GA3/atom). 



C. Agreement with experiment 



Errors per materials type 



Because Tabs. Tl |lII| are shaped like the periodic 
table, the color code immediately allows to sin- 
gle out the areas where PBE breaks down. It 
leads to a number of subsets (Fig. [5]) which can 
be eliminated from the test set. They are listed 
in Tab. |IV| For these elements, PBE performs 
significantly worse than usual, mostly because 
some key physical phenomenon is not described 
(well) by the functional. In this subsection these 
localized error zones are discussed in more de- 
tail, as well as the mechanisms on which they 
are based. 

A first, well-known example of the failure 
of PBE is the class of dispersion-governed 
compounds. Although some more advanced 
DFT approaches address this issue specifically 
|112|, 1113) , regular GGA functionals do not de- 
scribe London forces. This translates into a 
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Table VI. Residual errors (of the VASP-PBE regres- 
sion results with respect to the zero-kelvin extrap- 
olated experimental values gOl [H [HI EH E]) for 
two allotropes of carbon 





A£coh [kJ/mol] 


14 [A Vat] 


Bo [GPa] 


graphite 


39 (5%) 


3.2 (39%) 


54 (97%) 


diamond 


13 (2%) 


0.1 (2%) 


5 (1%) 



decreased cohesion, and hence an inflated vol- 
ume and underestimated bulk modulus (see also 
Sec. |IIlC2 ). 

Although London forces have been demon- 
strated to play a role in other structures as well 
|114H116] , the most important crystals that suf- 
fer from this shortcoming, belong to the non- 
metals (subsets 7 and 8). They include the no- 
ble gases, the dimeric crystals, graphite, and 
sulfur. In these materials the London dispersion 
interaction governs the bonding between atoms, 
diatomic molecules, graphene sheets, and 8- 
membered rings, respectively. Nevertheless it is 
essential to realize that both the element type 
and the crystal structure contribute to the im- 
portance of dispersion. It is perfectly plausible 
that a certain element behaves badly in struc- 
ture A, while there are no problems when it as- 
sumes structure B. This can be illustrated nicely 
by means of carbon. The dispersion forces be- 
tween the graphene layers in graphite give rise 
to a large discrepancy between DFT and exper- 
iment. Diamond on the other hand follows the 
same behavior as neighboring (semi)metallic el- 
ements or even outperforms them (Tab. VI). 



For the molecular crystals (subset 7) the PBE 
cohesive energy is larger than the experimental 
value [68j . contrary to the expectation. This is 
due to the overestimation of the intramolecular 
bond strength (see e.g. Lany [33] and Tab. 11 of 
Paier et al. |19j). which covers up any influence 
of the lack of dispersion. Elastic properties on 
the other hand are in most cases not affected 
by intramolecular effects and show a similar be- 
havior as for the remaining nonmetals. 
The magnetic materials (subset 3) stand out 
as well, predominantly with respect to AEcoh- 
Although the use of the generalized gradient 
approximation and a correct atomic reference 
state have already reduced the gap between the- 
ory and experiment substantially [45], the re- 
maining difference cannot be neglected. Cur- 
rent GGA functionals are not able to describe 
magnetic compounds very well. Manganese 
illustrates this nicely. Its intricate magnetic 



state |117) has been approximated by assum- 
ing only collinear magnetism, but this does not 
explain the observed differences. The cohesive 
energy, for example, would be higher in its cor- 
rect ground state, leading to an even more pro- 
nounced deviation from experiment. An expla- 
nation is found with Singh and Ashkenazi [118] , 
who noticed that GGAs overestimate the mag- 
netic energy. This is caused by the increased 
number of degrees of freedom in spin-polarized 
systems (two spins), while the number of phys- 
ical relations the GGA must fulfill stays the 
same. 

The discrepancies between theory and exper- 
iment are not caused by the DFT functional 
alone, however. For some magnetic elements 
the applied thermal extrapolations are no longer 
valid, because of phase transition effects in the 
vicinity of the Curie or Neel temperature. Ex- 
perimental chromium is a good example, dis- 
playing large magnetic distortions of the ther- 
mal expansion coefficient near 311 K |119| . A 
relation as simple as that of Ec^. ( 8]) cannot cap- 
ture these complex underlying phenomena. 
The transition metals with (nearly) full d shells 
sometimes deviate from experiment as well 
(subset 4). The effects are smaller than for the 
previous two classes of materials, but they are 
unmistakably present, especially in terms of the 
cohesive energy and the elastic constants. One 
can attribute this phenomenon to electronic 
correlation. For Zn, Cd, and Hg a full-fledged 
many-body treatment has indeed convincingly 
shown the influence of d electron correlation 
on AEcoh and the potential energy landscape 
[m 11201 Hn]. Data in Tabs, [njjil!] even im- 
ply that similar (but smaller) effects show up in 
other elements at the end of the d block, such 
as in Pd, Ag, Pt, and Au. In noble metals, dis- 
persion phenomena play an important role too 
|115j . however, and it is not immediately clear 
how much of the remaining discrepancy can be 
attributed to electronic correlation. Since the 
influence of correlation in these elements ap- 
pears limited, only Cd an Hg have been assigned 
to subset 4. 

It seems that at the end of the d block, the high 
number of localized d electron pairs in combi- 
nation with a small interatomic distance and a 
close-packed environment enhances correlation 
effects. Moreover, Philipsen and Baerends [45] 
suggest that at the very beginning and the very 
end of the 3d transition metals the GGA ex- 
change energy drops, causing the electron corre- 
lation to gain importance. Any fortuitous can- 
cellation of errors between exchange and corre- 
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lation has therefore disappeared for the transi- 
tion metals at the border of the d bfock. 
These correlation effects appear to be of a 
mainly anisotropic nature, since all elastic con- 
stants Cij are affected, but the deviations of Vq 
and Bo are less pronounced. This is also sug- 
gested by Wedig et al. |121j for Zn and Cd, 
where a different interlayer and intralayer be- 
havior is observed. 

Relativistic effects are expected to strongly 
influence heavy elements. VASP therefore 
makes use of the scalar-relativistic Koeller- 
Harmon approach by default |122j . The major 
remaining contribution is due to spin-orbit cou- 
pling. However, it is shown by Philipsen and 
Baerends [123^ that this does not change phys- 
ical properties substantially. Only for gold and 
bismuth a distinct change is reported, but with- 
out closing the gap between theory and experi- 
ment entirely. The remaining difference for Au 
is primarily due to correlation and dispersion 
effects, as was already suggested above. For 
the 6p elements on the other hand spin-orbit 
coupling really plays an important role. Some 
key properties for the 5d and 6p compounds 
have been calculated, both with and without 



spin-orbit coupling (Tab. VII I. It is immediately 
clear that, starting from the end of the 5d block, 
a spin-orbit treatment becomes indispensable. 
Hence, for all 5d and 6p elements this contribu- 
tion has been included, except for Cij [37] . 



2. Errors per property 

The previous section shows that FEE is not 
'complete': some features just cannot be de- 
scribed by a simple GGA functional. One can 
exclude the affected materials (outliers) before- 
hand, however, and limit the analysis to those 
cases where FEE should perform well. The in- 



trinsic errors from Tab. IV are applicable to 
these crystals. Tab. |IV| then shows that the 
FEE error estimates largely depend on what 
property is considered. The behavior of the 
residual error bar and the systematic deviation 
from experiment can be traced back to both 
the functional and the numerical determination 
of that particular property. Nevertheless, it 
is important to note that, although the over- 
all error estimate can be linked to theoretical 
aspects, the correspondence to experiment for 
a single compound depends on the experimen- 
tal accuracy as well. This is especially true 
for higher-order properties (such as the elas- 



tic constants and their derivatives), which are 
generally measured at a lower precision than 
those from (quasi) direct measurements (the lat- 
tice constants or the cohesive energy, for exam- 
ple). This is illustrated by the sometimes large 
spread on the data in Knittle's overview of Bi 
values 89J. 

From a computational viewpoint, however, the 
equilibrium volumes offer the best results 
among all considered quantities. After elimi- 
nating the outliers (listed in Tab. |IV| and rep- 
resented in Fig. |3(b)| by open symbols) , an al- 
most perfect correlation is obtained. Even so, 
the regression line does not coincide with the 
first quadrant bisector. Tab. IV shows that the 



cell volumes are consistently too large by ap- 
proximately 4 %. This deviation is a well-known 
property of any GGA [124J, including FEE. 
It originates in a systematic underestimation 
of the bond strength (underbinding), resulting 
in slightly larger volumes. More particularly, 
GGAs favor inhomogeneous systems, with large 
(reduced) density gradients. Small unit cells, 
which have a more evenly distributed electron 
density, are therefore energetically less prefer- 
able. This phenomenon especially affects open 
structures, where the high-gradient tails of the 



Table VII. Relative residual errors (of the VASP- 
PBE regression results with respect to the zero- 
kelvin extrapolated experimental values [40II43|[80] ) 
for Ag and the 5d and 6p materials, both with (SO) 
and without spin-orbit coupling (n-SO) 





AE 


coh 


Vo 


B 







n-SO 


SO 


n-SO 


SO 


n-SO 


SO 


Ag 


16% 


16% 


3% 


3% 


11% 


11% 


Lu 


7% 


8% 


3% 


3% 


18% 


18% 


Hf 


1% 


3% 


3% 


3% 


2% 


2% 


Ta 


1% 


2% 


2% 


2% 


1% 


0% 


W 


0% 


1% 


1% 


1% 


3% 


5% 


Re 


3% 


2% 


2% 


1% 


3% 


1% 


Os 


1% 


3% 


0% 


1% 


1% 


4% 


Ir 


4% 


0% 


0% 


0% 


0% 


2% 


Pt 


7% 


9% 


1% 


1% 


9% 


11% 


Au 


22% 


19% 


4% 


3% 


21% 


19% 


Hg 


75% 


69% 


29% 


22% 






Tl 


6% 


20% 


9% 


7% 


27% 


25% 


Pb 


44% 


4% 


4% 


4% 


10% 


18% 


Bi 


15% 


5% 


2% 


5% 


17% 


26% 


Po 


59% 


8% 


2% 


3% 


73% 


30% 


Rn 


83% 


81% 
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valence electron orbitals become non-negligible 

Immediately linked to this observation is the 
underestimation of the bulk modulus. More 
weakly bound structures will be more easily 
compressible, leading to smaller Bq values. Just 
like the too large predicted volumes, it is com- 
mon behavior for GGA functionals |124j . On 
the other hand, PBE bulk moduli are predicted 
with a larger uncertainty than the volumes. The 
intrinsic residual error remains in most cases be- 
low 10 to 15%, however (Tab. [lT|. The magni- 
tude of this difference is mostly due to the sen- 
sitivity of the E{V) curvature. 
Since Bq and the other elastic constants are 
closely related, the intrinsic errors with respect 
to the C'ij parameters are of a comparable scale. 
The bulk moduli are larger on average, which 
leads to slightly smaller relative errors (Tabs.jn] 
and III). However, a good correlation is found 
in both cases, with a similar value of r for the 



elastic constants (Fig. |3(e) ) and the bulk moduli 

(Fig.ra. 



For the cohesive energy PBE yields very 
good results as well. The intrinsic error bar of 
30 kJ / mol is of the same order of magnitude as 
the rms error found by Lany 33 for PBE heats 
of formation of semiconductors and insulators 
(0.24eV/atom). It is therefore representative 
for PBE energy differences between chemically 
different compounds. As mentioned before, for 
similar systems the intrinsic residual error bar 
is much smaller [39j . This also explains the suc- 
cess of evolutionary algorithms. They are based 
on energy differences of the order of a few meV 
per atom |1251I129] . but some results have al- 
ready been confirmed experimentally neverthe- 
less 0. 

Contrary to Vq and Bq there is now no system- 
atic under- or overestimation compared to ex- 
periment. The typical underbinding of GGA 
does not show as conclusively in AEcoh- This is 
due to the magnetic materials and the molecular 
crystals. As mentioned before, GGA function- 
als bias solutions towards magnetism for the for- 
mer and overestimate the intramolecular contri- 
bution in the latter. In both cases this causes 
the cohesive energy to oppose the dominating 
trend. Without the crystals from subsets 3 and 
7 in the test set, the cohesive energies would 
have been underestimated by 2% instead (p- 
value of 0.008). This behavior is in accordance 
with the expected underbinding of GGA. 
Since the bulk modulus derivative Bi is a 
higher-order parameter than Bq, the errors are 
expected to be one order worse as well. Al- 



though this is certainly the case, eliminating 
the outliers substantially improves the results 
(Fig. [3(d)] ). However, even when they are re- 
moved, the resulting correlation coefficient (r = 
0.849) remains significantly lower than for any 
other property already discussed. 
Bi appears to be overestimated with respect to 
experiment. This systematic deviation is sig- 
nificant, although the p- value may not show it 
conclusively (Tab. IV). It is again caused by 
GGA underbinding. As mentioned before, large 
volumes are favored due to their substantial 
density gradients. GGA hence lowers the en- 
ergies of bigger cells most and straightens out 
the equation of state. This causes the E{V) 
line to alter its decreasing curvature even more 
rapidly, increasing the rate of change of the bulk 
modulus with pressure (and volume), Bi. It 
also explains the deviating behavior of crystals 
with a low coordination, such as the molecular 
crystals. In these compounds the tails of the 
electron wave functions dominate the intersti- 
tial space, leading to considerable density gra- 
dients. The increase of the sensitive parameter 
Bi is then enhanced even further. 



IV. NUMERICAL ERRORS 

The previous section describes the intrinsic 
PBE errors for five different properties, based 
on a statistical treatment. They are used in a 
protocol which allows experimentalists and the- 
oreticians to correct the bare DFT-PBE values 
for the observed systematic deviation from ex- 
periment and which quantifies the uncertainty 
on the obtained predictions (Tab. [V]). A pre- 
requisite for such a protocol is that different 
DFT implementations provide the same pre- 
dictions: using different algorithms to solve 
the same (Kohn-Sham) equations should ideally 
lead to identical solutions. In practice, differ- 
ent amounts of noise are inevitably introduced 
in the predictions, even when numerical con- 
vergence has been achieved for each individual 
code. This scatter is due to several aspects of 
the solution algorithm. It can be due its nature 
(e.g. the kind of basis set or the frozen-core ap- 
proximation), its specific ingredients (e.g. the 
chosen pseudopotential) or its use of particular 
routines for standard tasks (e.g. Fourier trans- 
form routines). In order to guarantee the re- 



producibility of the intrinsic errors in Tab. IV 



one needs to examine to what extent these is- 
sues affect the DFT computations. Once again. 
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a reliable benchmark can be established using 
the ground-state elemental crystals. 
The current section describes a procedure to ex- 
press the difference between predictions from 
independent solid-state DFT approaches in a 
quantitative way, yielding a numerical error es- 
timate. It will be used to examine differences 
between the PAW and APW-l-lo method, rep- 
resenting both methods by suitable mainstream 
codes (see further). However, the difference be- 
tween codes can be attributed to other aspects 
as well, as was mentioned earlier. The influence 
of standard task routines is most likely small, 
but we will also use two PAW codes with dif- 
ferent PAW atomic potentials, which can have 
quite drastic effects on the DFT results. All 
codes are therefore considered here with their 
recommended potentials. These can be thought 
of as representative for the quality of the investi- 
gated code. Although it is not our primary goal 
here, the procedure that will be described can 
also be used to select better performing PAW 
atomic potentials. The same holds for pseu- 
dopotentials in the case of plane-wave codes. 



A. Test set preparation 

The present implementation assessment starts 
from one reference code, the all-electron pro- 
gram WIEN2k [130] (version 11.1). It uses the 
APW-f-lo basis set [HTl \TT2\ . which is consid- 
ered to be a standard for the numerical accu- 
racy of sohd-state DFT. WIEN2k predictions 
can therefore be considered to yield the exact 
results for a given functional, as long as numer- 
ical accuracy is achieved [M] (large basis set and 
dense fc-mesh, see Supplementary Material for 
more details [68] )■ Two codes are compared to 
this reference code: VASP [51|M] (version 5.2.2) 
and GPAW [mHraS] (version 0.8.0), both us- 
ing the PAW method GPAW calculates 
all wave functions, densities and potentials as 
grid-based quantities, while VASP uses a plane- 
wave basis set. All calculations with these pro- 
grams are performed by means of the poten- 
tials recommended by the respective developers: 
the 2010 recommended PAW potentials [BH] for 
VASP and the 0.6 atomic set-ups for GPAW. 
Detailed computational parameters are summa- 
rized in the Supplementary Material [55] . 
For reasons of uniformity and comparability the 
same PBE functional has been selected for all 
three codes. It is used in a protocol that seeks to 
evaluate a particular DFT approach in an eas- 



ily reproducible manner. The VASP-optimized 
ground-state crystal (Sec. Ill A) serves as a 



starting point for each computation and from 
a 7-point equation of state (0.94 Vq to 1.06 Vo) 
the properties of interest (Eq, Vq, Bq, and Bi) 
are extracted. All geometries are kept frozen 
(the cell shape and relative atomic positions are 
kept fixed at their initial values) , instead of al- 
lowing for relaxation changes. This not only 
lowers the computational load, it also restricts 
the code evaluation to the implementation of 
DFT-PBE itself. Indeed, the task of optimiz- 
ing the cell shape or internal positions belongs 
to another computational layer, on top of the 
task of solving the DFT equations for a given 
rigid geometry. This section aims to examine 
how different implementations compare with re- 
spect to the DFT-PBE procedure only. It does 
not intend to study how close every individual 
approach comes to experiment. 
In the same spirit some other modifications of 
the hitherto employed test set have been made. 
All calculations have been limited to the scalar- 
relativistic part (using the Koelling-Harmon ap- 
proach [122| ). By neglecting the spin-orbit con- 
tribution, an additional secondary algorithm 
implementation is avoided. The computational 
procedure also becomes more uniform this way, 
since all elements are now treated on equal 
terms. Because no spin-orbit coupling is added 
to the system's Hamiltonian, it suffices to use 
non-spin-orbit geometries as a starting point for 
the 7-point equation of state. 
A simplified unit cell has been selected for Mn 
and S as an additional means of lowering the 
computational effort. Manganese is treated in 
an antiferromagnetic fee phase (space group 
225, cF4), while for sulfur the /3 Po phase is 
imposed (space group 166 or hR3). These ge- 
ometries are physically relevant, as they can be 
found in the Mn and S phase diagram respec- 
tively jisellm] . 

All other elements have been kept at the struc- 



ture previously optimized by VASP (Sec. Ill A) 



in order to conserve the large diversity of the 
input set. The GIF files for all crystals in this 
code benchmark set are available in the Supple- 
mentary Material [68] . 



B. Agreement between implementations 

The procedure mentioned above results in a 
large collection of numbers for each code (71 
elements x 4 properties). It is not conve- 
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meV 




code 1 



Figure 5. (Color online) The EOS parameters can 
differ significantly, while the E{V) curves them- 
selves are very similar. In that case the area be- 
tween the two functions is a better indicator of the 
overall deviation 



nicnt to compare them directly, however. Be- 
cause of the different units involved, a coher- 
ent approach would require the use of rela- 
tive deviations. Tabs. lUllIII on the other hand 
show that each property corresponds to a dif- 
ferent magnitude of relative error. This scale is 
mainly determined by the computational proce- 
dure and therefore does not alter substantially 
when shifting from a code-experiment compari- 
son to an intercode assessment. A single numer- 
ical error value, expressing the difference be- 
tween two particular DFT methods by means 
of one number, can be obtained by applying 
a weighed average. As all properties of inter- 
est depend on the equation of state, it is most 
straightforward to compare the E{V) curves 
produced by different approaches directly. The 
dispersion-governed compounds illustrate this 
strategy well. Since their E{V) curves are very 
shallow, small deviations in the bulk modulus 
will inflate the relative error considerably. How- 
ever, the equations of state as such can be very 
similar, the two curves at no point differing by 
more than a few meV per atom (Fig. [5|. For 
that reason a numerical error estimate A is de- 
fined as follows: 



A 



l[AE^V)dV 



(15) 



In other words, the rms energy difference be- 
tween the E{V) curves of these particular pro- 
grams is averaged over all elemental crystals. A 
hence provides an intuitive measure of the en- 
ergy distance between equations of state. 
Because different codes sometimes employ dif- 
ferent reference energies Eq, depending on the 



concept, all equations of state are set to zero at 
their equilibrium volume. An alternative solu- 
tion would entail the calculation of cohesive en- 
ergies, in order to provide a common reference 
for the equilibrium energy. However, not all 
programs allow for an easy manipulation of the 
electronic configuration of atoms. Moreover, 
the computational load would increase consid- 
erably. 

The computation of A can be automated quite 
easily. The fitted Birch-Murnaghan equation 
allows Eq. (151 to be written in an analytical 



form. Only Vq, Bq, and Bi are then needed for 
both codes under investigation. The resulting 
expressions have been added in the appendix 
for convenience. The WIEN2k data necessary 
for a code comparison have been provided in 
the Supplementary Material |;68,. 
The interval of integration is linked to the refer- 
ence data. In view of how the E{V) parameters 
are determined, the intercode difference is to be 
integrated between Vo,vF/EAr2/c±6 %. AV hence 
corresponds to 0.12VQ^wiEN2k- By definition 
A (APW+lo) ^^jgp^2k) becomes zero. 
The rms energy differences between the equa- 
tions of state predicted by APW-|-lo(wiEN2k) 



and PAW(vASP) 
PAWf 



or APW-|-lo(wiEN2k) and 
'(GPAW)i are represented in Tab. IIXI They 
show that the most critical elements are char- 
acterized by approximately half-filled d lev- 
els. Numerical errors can amount to up 
to 8.3meV/atom for PAW(vasp) (Tc) and 
20.9meV/atom for PAW(gpaw) (Ru). This 
agrees with physical intuition, because these 
crystals are among the least compressible. 
Their equations of state are very steep, and rel- 
atively small modifications of the parameters 
can strongly change the energy. The least sen- 
sitive elements are for the same reason located 
near the alkali metals and the noble ga ses (0 - 
0.7meV/atom numerical error) (see Tab. VIII). 



Only in comparison to experiment the latter 
group of materials stands out, but this is be- 
cause PBE grossly overestimates the rare gas 
volumes. 



When averaging the numbers in Tab. IX over 
all elements, the numerical error of each DFT 
approach can be determined for the given 
set of recommended PAW potentials. A 
is 1.8meV/atom for PAW(vasp)i while for 
PAW(GPAW) it is 3.3meV/atom. This agree- 
ment between implementations is an order of 
magnitude better than the difference with ex- 
perimental results. To show this, a similar en- 
ergy difference between DFT-PBE and experi- 
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Figure 6. (Color online) Intrinsic (PBE re- 
gression versus experiment) and numerical errors 
(PAW(vASP) versus APW+lo(wiEN2k)) for the equi- 
librium volume of the ground-state elemental crys- 
tals, using the subsets of elements that have been 
shown to perform well for PBE (see Tab. |IV[ ). A nor- 
mal distribution has been fitted to both data sets 
(dotted line) 



ment is computed. It uses experimental values 
as the reference situation, while the method un- 
der test is the full-fledged version of PAW(vasp) ■ 
This means that the E{V) parameters have 
been taken from Tabs.HIland lllll The deviations 
per element are presented in Tab. |IX[ leading 
up to a A-factor of 23.5 meV/atom. This differ- 
ence in magnitude can also be observed with the 
E{V) characteristics themselves. Fig. [6] shows 
the distribution of volume errors between two 
codes and with respect to experiment. Again, 
the spread is much larger in the latter case. 

A(PAW)(vASP) does not change noticeably 
when the number of elements is reduced to that 
of PAW(GPAW)- This shows that GPAW and 



Table VIII. Comparison between codes for two ex- 
treme situations: large (Tc an Ru) and small (Ar) 
numerical errors A^. Vo is given in A'^/atom, Bq in 
GPa, and Ai in meV/atom. Bi is dimensionless 







Vo 


Bo 


Bi 


A, 


Tc 


APW+l0(wiEN2k) 


14.47 


301.4 


4.56 







PAW(VASP) 


14.60 


298.5 


4.55 


8.3 


Ru 


APW-|-l0(wiEN2k) 


13.81 


315.4 


4.96 







PAW(GPAW) 


14.09 


310.9 


4.87 


20.9 


Ar 


APW-|-l0(wiEN2k) 


52.21 


0.7 


7.84 







PAW(vASP) 


52.65 


0.8 


7.35 


0.1 




PAW(GPAW) 


52.66 


0.8 


3.27 


0.1 



VASP, while both using the same PAW method, 
do not produce entirely identical results. This 
variation most likely originates in the different 
quality of the atomic potentials and the differ- 
ent type of basis functions used. However, in 
comparison to experiment, the differences are 
negligible. The intrinsic residual error bars and 
regression slopes provided in Tab. |IV| can there- 
fore be applied to DFT-PBE results, irrespec- 
tive of which approach was used to calculate 
them. 

This comparison of three DFT implementations 
can easily be extended. Ideally every solid-state 
DFT approach should be tested in the same 
way, and have its A-value computed. As such 
tests are preferably performed by specialists in 
the individual codes, all input GIF files have 
been made available in the Supplementary Ma- 
terial [68], as well as some ready-made post- 
processing scripts. In addition, the ASE devel- 
opers have implemented a framework for per- 
forming the necessary calculations jl33l 1138] . 
On the CMM website |139] an updated overview 
will be maintained of all A-factors reported to 
us. Such information not only provides insight 
into the reproducibility of the intrinsic errors of 
Tab. |IV[ but can also guide users to select a 
method for a specific task, at least as far as ac- 
curacy of energy- versus- volume relations is con- 
cerned. 



V. CONCLUSIONS 

Using the ground-state elemental crystals as a 
test set, DFT-PBE computational errors have 
been reviewed. Errors intrinsic to the functional 
were quantified for five materials properties, de- 
scribing energetic (AEcoh) and elastic (Vq, -Bq, 
Bi, Cij) quantities. They explain the deviation 
of DFT predictions from experiment. Numer- 
ical errors, due to the implementation of the 
DFT scheme into a computer code, were stud- 
ied for the PAW method (VASP and GPAW), 
and were expressed with respect to the reference 
APW-l-lo method (WIEN2k). Both types of er- 
rors have been discussed for PBE, one of the 
most widely applied functionals in solid-state 
DFT. The results are expected to be represen- 
tative of GGA in general. 

Each of the five properties has been assessed 
with respect to the ground-state elemental crys- 
tals. The correspondence to experiment has 
been analyzed statistically, leading to a decom- 
position of the intrinsic error into systematic 
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Table IX. (Color online) Rms energy differences Ai between the equations of state predicted by 
APW+lo(wiEN2k) and PAW(vasp) (green), APW+lo(wiEN2k) and PAW(gpaw) (red), and experiment and 
PAW(vASP) (blue) for the ground-state elemental crystals. All values are expressed in meV per atom. The 
darkest shades correspond to the largest errors. The average numerical error A is shown for each code at 
the header of the table 
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deviations and residual error bars. These in- 
trinsic errors have been shown to agree with 
some generaUy known GGA traits. The typi- 
cal underbinding of GGA has been reproduced 
and quantified, for example. Tab. [X] presents 
a summary of our results, as well as a similar 
analysis for other properties that are available 
from data sets in the literature. Contrary to 
Tab. |IV[ however, Tab. |X] presents systematic 
deviations in terms of the experimental value 
{{xth - Xexp)/xexp = 1/^ " !)■ This exprcsses 
more intuitively to what extent DFT varies from 
experiment: a — 1 of -1-1% for example 
means that PBE overestimates the experimen- 
tal result by 1 %. 

Based on the quantification of intrinsic errors, a 
computational recipe has been presented which 
allows to correct bare DFT-PBE results for 
the systematic deviation from experiment, and 
which attaches meaningful error estimates to 
the obtained predictions. (Tab. [v]). An exam- 
ination of 31 binary compounds not included 
in the benchmark set [34j [35] indicate that our 
analysis carries over to multicomponent crys- 
tals. Errors can hence be estimated straight- 
forwardly for PBE predictions already available 
from literature. 

The overall agreement between VASP-PBE 
and experiment is quite good, but some sub- 
sets of elements perform better than others. 
DFT predictions for magnetic materials and 
correlation-dominated compounds deviate sig- 
nificantly from experimental values, for exam- 
ple, especially with respect to the cohesive en- 
ergy. Long-range interaction is another is- 
sue. Although some solutions exist to incorpo- 
rate London dispersion into DFT, such as the 
DFT-D [m] or vdW-DF2 method ir3i, regu- 
lar GGAs do not describe dispersion-governed 
crystal types well. Bulk moduli are found to be 
consistently underestimated, while predictions 
for both the volume and the pressure deriva- 
tive of the bulk modulus are systematically too 
large. Results for heavy elements are acceptable 
as long as spin-orbit coupling is added, starting 
from the 6p block. Based on these observations, 
some general guidelines have been summarized 
in Tab.pC|as to what categories of materials will 
not be described well. Some classes were not 
or only marginally represented in the elemental 
benchmark set, such as ionic or strongly cor- 
related compounds. For these, a similar study 
using an extended benchmark set, by includ- 
ing some binary ionic compounds and transition 
metal oxides, would be useful. 
All conclusions with respect to the intrin- 



sic PBE errors can only be universally ap- 
plicable when it does not matter how the 
DFT formalism is implemented. Such nu- 
merical errors should be much smaller than 
intrinsic ones. By means of a quality fac- 
tor A, which conveys exactly this informa- 
tion, APW+lo(wiEN2k) has been compared 
to PAW(vASP) = 1.8meV/atom) and 

PAW(GPAw) = 3.3meV/atom), both for 
their recommended sets of atomic potentials. 
The rms energy distance between equations of 
state from different methods indeed appears 
to be an order of magnitude smaller than the 
gap between theory and experiment (see also 
Tab. |x]). The intrinsic systematic deviations 
and residual error bars presented in Tab. [X| can 
hence be applied to PBE predictions regardless 
of the computational approach. This is useful 
when discussing the implications of DFT results 
in an experimental context. 
This accuracy review is to be considered as a 
starting point only. The presented statistical 
procedure is applicable to other functionals or 
methods as well. It would be useful to de- 
termine the intrinsic systematic deviations and 
residual error bars for e.g. LDA or hybrid func- 
tionals. A comparison to results from high- 
level many-body techniques would even allow 
to eliminate the influence of experimental er- 
rors. Another extension would be to take into 
account experimental error bars in the statisti- 
cal analysis. For the relevant properties of the 
elemental materials, this requires an extensive 
literature search for the most accurately known 
values and their error bars. These error bars 
are not commonly available in tabulations in the 
literature and are therefore beyond the scope of 
this work. With respect to the assessment of 
numerical errors, we invite both code develop- 
ers and users to determine the quality factor A 
for their code as well. It not only guarantees 
the transferability of the intrinsic errors to all 
codes, but also provides a criterion to evaluate 
the accuracy of a particular DFT approach. A 
comprehensive list of A-factors |139| can then 
serve as a guideline through the maze of avail- 
able DFT methods. 
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Table X. Systematic deviations 1//3 — 1 and residual error bars for DFT-GGA predictions compared with 
experiment (intrinsic errors) and between codes (numerical errors). A represents the average rms energy 
difference between the equations of states of two codes. All data were (re)analyzed in the present study, 
except for the error bar for AEevoi, which is based on a proof of principle (see Sec. 
materials to which the error estimates do not apply are mentioned explicitly 
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APPENDIX: CALCULATING THE 
A-FACTOR 

For a particular compound, the energy differ- 
ence between the equations of state of WIEN2k 
(w) and a code under investigation (c) can 
be evaluated analytically, using the Birch- 
Murnaghan relation of Eq. ([2]). Some mathe- 



matical manipulations yield 

/ iE-iV)~E^{V)f dV^F{Vf)~F{V,) 

(A.16) 

where the primitive function F{V) can be writ- 
ten as a power series in V^^^"^: 

4 

F{V) = (A.17) 
The coefficients Xn are given by 



2n + 1 ^-^ ■' ■' 

i+j=n+2 

(A.18) 

with i,j e {0,1,2,3}. Then a;_i for example 
becomes 



x-i = 6{a1 - a'i'){a^o - a^) (A.19) 

The constants are the coefficients of the 
Birch-Murnaghan equation in its polynomial 
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fori 



03 = 
02 = 
ai 

flo 



^0^0 

16 

,7/3 



(Si - 4) 



16 



16 
9VoBo 

16 



(14-3Bi) 



(3Bi - 16) 



(6 -Si) 



(A.20) 

(A.21) 

(A.22) 
(A.23) 



When evaluating Eq. (A. 18 1, stands for the 
coefficient of the code under test, while a™ 
means it corresponds to the WIEN2k equation 
of state. 
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